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Abstract: Our problem is to find a good approximation to the P-value of 
the maximum of a random field of test statisties for a cone alternative at 
each point in a sample of Gaussian random fields. These test statistics have 
been proposed in the neuroscience literature for the analysis of fMRI data 
allowing for unknown delay in the hemodynamic response. However the null 
distribution of the maximum of this 3D random field of test statistics, and 
hence the threshold used to detect brain activation, was unsolved. To find a 
solution, we approximate the P-value by the expected Euler characteristic 
(EC) of the excursion set of the test statistic random field. Our main result 
is the required EC density, derived using the Gaussian Kinematic Formula. 
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1. Introduction 

It seems appropriate to begin this paper witli a tribute to the paper's second 
author, Keith Worsley, for whom this appears posthumously. This paper is to 
appear in a volume celebrating David Siegmund's 70th birthday. David and 
Keith Worsley had worked together several times over their careers Siegmund 
and Worsley (1995); Shafie et al. (2003), most often at the intersection of their 
two interests: the distribution of the maximum of random fields. While David's 
interests range from the smooth to the non-smooth case, Keith was most inter- 
ested in smooth random fields and their application to brain imaging Worsley 
(1994); Friston et al. (1995); Worsley et al. (1996). This paper represents Keith 
Worsley's last work, before he passed away prematurely from pancreatic cancer 
in February 2009. Keith and the first author had discussed this paper right up 
to a few days before he passed away. 

•Supported in part by NSF grant DMS-0906801. 

tKeith Worsley, friend, mentor and colleague passed away February 27, 2009. 
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David has considered two main approaches to such problems: Weyl's volume 
of tube formulas as in Johnstone and Siegmund (1989); Knowles and Siegmund 
(1989) and change of measure approaches as in Nardi et al. (2008). On the 
other hand, Keith preferred using the expected Euler characteristic (EC) ap- 
proach of Adler (1981) and his generalizations Worsley (1995b). In this paper, 
we combine the EC approach to the volume of tube formula via the Gaussian 
Kinematic Formula (Taylor, 2006) which expresses Keith's EC densities in terms 
of coefhcients the Gaussian measure of a tube. Referring back to David Sieg- 
mund's approach to these problems, these coefficients are also coefficients in an 
expansion of their own change of measure formula on Gaussian space (Taylor 
and Vadlamani, 2011). 

This paper is concerned with the maxima of (functions of) smooth Gaussian 
random fields. Let T{s), s € be a random field, and let S C be a fixed 
search region. Our main interest is to find good approximations to the P-value 
of the maximum of T(s) in S: 



The random field T{s) will be one of a variety of test statistics for a cone 
alternative in a multivariate Gaussian random field. Two of these test statistics 
have been proposed in the neuroscience literature (Friman et al., 2003; Calhoun 
et al., 2004) but without a P-value (1). Worsley and Taylor (2006) gives a 
heuristic approximation to the P-value of the Friman et al. (2003) statistic. 
This has been incorporated into the R package fMRI (Polzehl and Tabelow, 
2006) . This paper aims to give a correct P-value approximation to both of these 
test statistics and the likelihood ratio test statistic. 

To do this, we first define the test statistic random fields in Section 2, then 
evaluate their approximate P-values (1) in Section 3 using the EC heuristic and 
the Gaussian kinematic formula. Section 3 concludes with a subsection that 
relates our methods to those we have used for the Hotelling's random field 
(Taylor and Worsley, 2008). Finally in Section 4 we apply our methods to the 
re-analysis of an fMRI data set already used for the same purpose in Worsley 
and Taylor (2006). 

2. The test statistics 

2.1. Definitions of the test statistics 

The test statistics are defined as follows. Let Z{s) — (Zi(s), . . . , Z„(s))', s € 
S C M^, be a vector of n i.i.d. Gaussian random fields with 



Usually cr(s) is unknown and must be estimated separately at each point, so 
keeping this in mind, we will set cr(s) = 1 without loss of generality. Let U C 



P max T(s) > t 





(1) 



E(Z(s))=A^(,s), YiZis))^a{sfl, 
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O" ^, the unit {n — l)-sphere. At each s E S, we are interested in testing that 
the mean is zero against the cone alternative: 

Ho : ^J.{s) = vs. Hi : ^(s) G Cone(C/) = {c ■ u : c > 0, u £ U} 

(Robertson et aL, 1988). The hkehhood ratio test of Hq vs. Hi is equivalent to 

x{s)^maxu'Z{s), (2) 

which we call the x random field because it has a so-called x rnarginal distri- 
bution when Cone(t/) is convex (see Section 2.3 below). As mentioned above, 
(t(s) is usually unknown so the x random field must be normalized separately 
at every point s. We shall consider two ways of doing this. 

The first is the likelihood ratio cone random field, equivalent to the likelihood 
ratio of the cone alternative under unknown variance: 

T.k(.) - ^^'^ 



^i\\ZisW-xisr)/n 

or equivalently, the maximum correlation between a point in the cone and the 
data. The second, proposed by Friman et al. (2003), is only defined if f/ is a 
subset of some fc-dimensional subspace of E" , in which case there are effectively 
V = n — k residual degrees of freedom which can be used to estimate (7(s) and 
normalize x{s)- Suppose Z±{s) is the projection of Z{s) onto the orthogonal 
complement of the linear span of C/, so that Zi_{s) is independent of x{s) and 
has mean under Hi . Then the independently normalized cone random field is 

rr ( \ _ ^(«) 

^INlSj - II ^ / N|| / /-■ 

\\Z^{s)\\/Vi' 

Note that \iU = O^^^ (by this we mean a (fc— l)-sphere embedded in M") then 
the two cone random fields are both equivalent to the F-statistic random field 

\\Zj{s)llk 

where Z-f{s) is the projection of Z{s) onto the linear subspace spanned by U. 



2.2. Power and maximum likelihood 

Both cone statistic random fields should be more powerful than the F-statistic 
random field since the F-statistic wastes power on alternatives that are outside 
the cone. The one-sided F-statistic tries to make up for this, but it is inad- 
missable (for infinite v and fixed s) because its acceptance region is concave 
(Birnbaum, 1954) - see Figure 1 - although it is not clear how to construct a 
test which dominates it. If in fact the alternative is at the middle of the cone 
then Ti should be the most powerful. 
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Fig 1. Rejection regions (the side of the boundary that excludes the origin) of the test statistics 
at P = 0.05 with infinite sample size for a 2D (k = 2) right-angled cone alternative covering 
the first two components Zi,Z2 of Z. The middle of the cone u is parallel to the Z\ axis. 
The cone can also be expressed as a linear model with m = 2 regressors xi and X2 with 
non-negative coefficients /3i > and 02 > 0. The x statistic is the length of the projection 
of Z onto the nearest edge of the cone (including the vertex of the cone and the interior of 
the cone itself). The null distribution of x is a mixture of Xj random variables with weights 
Pj = P(#{/3's > 0} = j) equal to the relative size of the shaded regions: po,l,2 = 1/4, 1/2, 1/4. 
The statistic is the one-sided F statistic of Calhoun et al. (2004)- 

Between the two cone statistics, the advantage of Tlr(s) is that it uses all 
the information in the data to estimate the variance and so it should be more 
powerful than Tin(s). Cohen and Sackrowitz (1993) show that T'lr(s) is admis- 
sible in specific examples, whereas Tin(s) is always inadmissable. However if in 
fact the mean is outside the cone but still inside the linear subspace spanned 
by [/, then we would expect Tin(s) to be more powerful. The reason is that a 
mean yLt(s) outside the cone would increase the denominator of T'lr(s) but not 
that of Tin(s). Friman et al. (2003) chose the more conservative riN(s). This 
strategy sacrifices a few degrees of freedom and a small loss of power if /Lt(s) 
really is in the cone, against a much larger loss of power if it is not. Worsley 
and Taylor (2006) investigates power in an fMRI application that we shall also 
use in Section 4. For a general discussion of power and likelihood ratio tests in 
this setting see Perlman and Wu (1999). 

We note in passing that we have used maximum likelihood principles only 
at a single point s, not over the whole space iS, which would require a spatial 
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model for the mean and covariance function of the random fields. In the case 
of known ct(s), a standard reproducing kernel argument, discussed in Siegmund 
and Worslcy (1995), can be used to show that if each of the components of /i(s) is 
proportional to the spatial correlation function centered at some unknown point 
So (which is assumed to be the same for each component), then maxggg xi^) is 
the likelihood ratio test statistic. 

Our interest is confined to s in a search region 5 C M^, where we expect Hq 
to be true at most points, with only a sparse set of points where Hi is true. 
This suggests that we should estimate by thresholding the above test statistic 
random fields at some suitably high threshold. Choosing the threshold which 
controls the P-value of the maximum of the random field to say a — 0.05 should 
be powerful at detecting Si , while controlling the false positive rate outside Si 
to something slightly smaller than a. Our main problem is therefore to find the 
P-value of the maximum of these random fields of test statistics (1), which is 
the main aim of this paper. 

2. 3. Mixture representation of x 

The X random field is so-named because it has a useful representation in terms 
of a mixture of Xj random fields with j degrees of freedom (Lin and Lindsay, 
1997; Takemura and Kuriki, 1997). The mixture representation works when 
Cone(C/) is convex and polyhedral, and asymptotically when Cone(t/) is only 
locally convex (see Section 3.2 below). The simplest way of seeing where the 
polyhedral cone enters the picture is to write it as a linear model with non- 
negative coefficients: 

m 

Hi: fi{s) = J2xjPj{s), /3i(s),...,/3™(s)eM+. (3) 
i=i 

The regressors xi,...,Xm G IR" contain the vertices of U (times arbitrary 
scalars), and they may be linearly dependent (see Figure 1). The cone may 
even contain linear subspaces (for instance, take X2 = —xi above) which effec- 
tively corresponds to having a certain number of unrestricted coefficients in /i(s) 
under Hi. 

To actually compute the x(s) random field, one must solve a convex problem 
at each location s. This can be done in several ways: the most direct is to first 
perform all-subsets least-squares regression, then throw out any fitted model 
that has negative coefficients. Amongst those that are left, the model that fits 
the best, with fitted values 

m 

Z(s)=/2(s)=^x,^,(s), A{s),...Jm{s)eR+, (4) 
is the maximum likelihood estimator of /i(s), and x(s) = ||Z(s)||. Alternatively, 
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one may solve the problem 

minimize \\Z - X/3(s)\\l subject to /3»(s) > 0, 1 < i < m, s e S". (5) 

This is is a collection of separable convex problems, each of which can be solved 
via coordinate descent Friedman et al. (2007) or first-order methods (c.f. Becker 
et al. (2009)). As the inputs are smooth, one would expect that warm starts at 
adjacent locations would greatly speed up the convergence of such algorithms. 
There is a huge literature on such non-negative least squares (NNLS) problems, 
with many applications in inverse problems, and many faster algorithms than 
all-subsets regression, such as the classic one by Lawson and Hanson (1995). 

From a geometric perspective, estimation of /i(s) is equivalent to projecting 
Z(s) onto Cone(J7), i.e., finding the face of Cone(C/) closest to Z{s). Here, a face 
of Cone([/) could represent the vertex of Cone(J7), in which case Z{s) = 0; an 
edge of Cone([/); or even the interior of Cone(J7), in which case Z{s) = Z{s). 
Let A C Cone(C/) represent a generic face of Cone(J7). Further, let Za{s) be 
the projection of Z{s) onto the linear subspace spanned by A, so that {Za{s) € 
Cone(J7)} is the event that the non- negativity restrictions are satisfied for face 
A. Then, 

X{s) = maxl^g^(^)gc^^^(y)j • ||Z^(s)||, (6) 

and let A{s) be the value of A that achieves this maximum. Actually, there 
are values of Z(s) for which more than one face achieves the maximum above, 
though these occur on lower dimensional subsets of R", which correspond to 
lower dimensional surfaces in the search region S. From (6), it is clear that 

Xi^)-T.hA(s)=A}-\\ZAm. (7) 
A 

Clearly, ^ 

X{s)\{A{s) ^ A} ^ Xdim(A), 

which only depends on the dimensionality of A, and so 

x{s)\{dim{A{s)) = j} - Xj- 
Hence its unconditional marginal distribution is a mixture of Xj's 

n 

nx{s)>t)^Y.p^{u)nxj>t) (8) 

3=0 

with weights 

pj{U) ^¥ (diin{A{s)) = j'^ , 0<j<n. 

These weights are the probability that the face of Cone(J7) that is closest to Z 
has dimension j, or, in terms of the fitted linear model (4), 

Pj ([/) = P (#{^'s > 0} = j) , < J < n. 
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Above, we define xq = to be a constant random variable which corresponds 
to Z{s) being closest to the vertex of Cone(C/). Depending on the structure of 
Cone(C/), one or more of the pj(C/)'s may be zero. More specifically, let L{U) 
be the largest linear subspace contained in Cone(J7) with L{U) possibly equal 
to 0, the subspace containing only the vector. It is not hard to see that 

1{U) = dim(L(C/)) = min{j : pj{U) > 0} 

and further, 

\\ZLiuM\<xis)<\\Z{s)\\. 

Finally, we also note that, for t > 0, P(xo > t) — so effectively the sum in (8) 
is really a sum over 1 < j < n and we can generally ignore po{U) which we do 
in later expressions for the EC densities of Tin(s) and Tlr(s). 

By approximation, this argument extends to general convex cones, though 
the pj 's have slightly different interpretations even though they are limits of the 
Pj^s of the polyhedral approximations, see Section 3.2 below (Lin and Lindsay, 
1997; Takemura and Kuriki, 1997). 

Note that while the marginal distribution of the x(s) random field is a mixture 
of Xj random variables, it is not strictly a mixture as a random field. Rather, 
realizations of the random field resemble a patchwork of Xj random fields with 
patches {s : A{s) = A} on which we observe ||Z^(s)|| ~ Xdim(^) (see Figure 2). 

This representation also sheds some light on the two normalized random fields 
2lr(s) and Tin(s) as patchwork mixtures oi V F random fields of appropriate 
degrees of freedom. In terms of the representation (7), it is not hard to see that 

Above, some slight care must be taken at points s contained in the intersection 
of the closure of two or more patches. For these points, we can arbitrarily assign 
A{s) to any appropriate face of Cone(C/). The representation (9) shows immedi- 
ately that its marginal distribution is that of a mixture of \/ jn/{n — j) ■ Fj ^-j 
random variables with weights Pj{U). As in the xo case, we define Fg,/ = to 
be a constant random variable for all I. For the independently normalized cone 
random field ^ 

which shows that its marginal distribution is a mixture of \fY^~Fj^ random 
variables with weights pj{U). 



2.4- Dimensionality 

The representation of Tin(s) and Ti,^{s) as patchwork mixtures of \/F random 
fields shows that we must consider constraints on D dictated by the total degrees 
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Fig 2. Examples of n = 3 Gaussian random fields in D = 2 dimensions (top row). Bottom 
row: the random fields Tlr, Tin o.'fid, for the same quarter circle cone as in Figure 1, 
so that k = 2 and u = 1. In the three patches the x random fields are Xj fields with j = 
dimensionality of the nearest cone face. In the gray patches, j = 0, Tlr = Tjn = .F-^. = 0/ 
in the medium shaded patches, j = 1, ^1,2 and T^^ ~ ^1,1/ unshaded patches, 

j = 2, T^j^ = = ~ -F2,l (times scalars). The boundary between the medium shaded and 
unshaded patches (heavy black line) is the edge of the cone, x\ orx2. When the denominator 
has one degree of freedom, the statistic takes the value oo on random curves; when it has 
two degrees of freedom, it takes the value oo only at the points where these curves touch 
the boundary. Tin "o* defined everywhere because it takes the value 0/0 at random points 
(arrow). 
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of freedom n and Cone(J7) (see Figure 2). For the F random field, recalling the 
argument in Worsley (1994), we note that the set where ||Z(s)|| takes the value 
zero is the intersection of the zero sets of each of the components of Z{s), so its 
dimensionality is D — n if I? > n or empty if D < n. This means that if D > n 
then F{s) = 0/0 with positive probability somewhere inside 5*, in which case 
F{s) is not defined. Hence we must have D < n for F{s) to be well defined. The 
same argument applies to -F+(s) and to Ti{s) for which we must have D < ly+l. 

By a similar argument, Ti^n(s) is made up of ^ Fj^^-j random fields for 
'(f^) < j < so we must have D < n to avoid 0/0 for such random fields. A 
similar argument applies to riN(s) though the limit on the dimension is more 
restrictive and slightly more difficult to describe. In principle, we simply want 
to avoid 0/0 for the random field Ti^i^s). However, when 1{IJ) = 0, we can allow 
some isolated 0/0 points within the interior of the patch {s : A(s) = 0}, i.e. when 
the numerator of Tin(s) is 0. If we allow more than isolated points, say curves 
of 0/0, these will generally intersect the boundary of the patch {s : A[^s) — 0} 
causing Tin(s) to be undefined at such points (see the white arrows in Figure 
2(a,b)). In other words, we really need to avoid 0/0 on the closure of the set 
{s : A{s) ^ 0}. When 1{U) = 0, on this set 

min{||Z4(s)|| : dim(A) = 1} < x(s) < ||^(s)|| 

therefore there will be no 0/0's if there are no 0/0's for any of the Fi^^ random 
fields 

{^-»<^)-). 

that is, if D < J' + 1. However, if 1{U) > 0, then {s : A{s) = 0} is of strictly 
lower dimension than D and even isolated 0/0 points within this patch will 
cause Tin(s) to be undefined, hence we must again avoid 0/0's in the closure of 
{s : A(s) 7^ 0} which is just the entire search region. As noted in the previous 
section, when 1{U) > 

\\ZLiU)m<x{s)<\\Z{s)\\ 

and there will be no 0/0's in Tin(s) if there are no 0/0's in the Fkjj-^ i, random 
field 

that is, if D < i/ + 1{U). In summary, considering both cases 1{U) = and 
1{U) > 0, we must have D < i' + max(Z([/), 1). 

When Cone(J7) is non-convex, the situation is more difficult to describe in 
exact terms for both Tii^{s) and Ti^nis). If Cone(J7) is non-convex, then the 
marginal distribution of xi^) is no longer exactly a mixture of Xj's with the 
error being exponentially small Taylor et al. (2005). 
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Obs-arvad 




Fig 3. The Euler characteristic (EC) of excursion sets of the Gaussian random field Z\ 
from Figure 2 plotted against threshold t, together with the expected EC under Hq from (11). 
Bottom row: the excursion sets (light gray) for t = —2, . . . , 3; the search region S is the whole 
image. At high thresholds the expected EC is a good approximation to the P-value of the 
maximum (arrowed). The approximate P = 0.05 threshold is t = 3.57 (arrowed). 



3. P-value of the maximum of a random field 

A very accurate approximation to the P-value of the maximum of any smooth 
isotropic random field T{s), s G S C M^, at high thresholds t, is the expected 
Euler characteristic (EC) ip of the excursion set: 

P (maxT(s) > t \ ^ E(</.{s e S : T{s) > t}) - ^ CdiS)pdit), (11) 
^ ^ d=0 

where Cd{S) is the d-dimensional intrinsic volume of S (defined in Appendix A), 
and Pdit) is the d-dimensional EC density of the random field above t (Adler, 
1981; Worsley, 1995a; Adler, 2000; Adler and Taylor, 2007). The heuristic is that 
for high thresholds the EC takes the value or 1 if the excursion set is empty 
or not, so that the expected EC approximates the P-value of the maximum (see 
Figure 3). The approximation is extraordinarily accurate, giving exponential 
accuracy for Gaussian random fields (Taylor et al., 2005). A difi'erent approach 
using volumes of tubes (Knowles and Sicgmund, 1989; Johanscn and Johnstone, 
1990; Sun, 1993; Sun and Loader, 1994; Sun ct al, 2000; PiUa, 2006) is, in our 
context, essentially the same as the methods used here, as shown by Takemura 
and Kuriki (2002). 

For £) = 3, our main interest in applications, £0,1,2,3(5') are: the EC, twice the 
'caliper diameter', half the surface area, and the volume of S respectively (for a 
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convex set, the caliper diameter is the average distance between the two parahel 
tangent planes to the set). If the random field T{s) is a function of Gaussian 
random fields, such as all the test statistic random fields considered so far, and 
these Gaussian random fields are non-isotropic, then it is only necessary to 
replace intrinsic volume in (11) by Lipschitz-Killing curvature. Lipschitz-Killing 
curvature depends on the local spatial correlation of the component Gaussian 
random fields, as well as the search region S (Taylor and Adler, 2003; Taylor 
and Worsley, 2007). 



Morse theory can be used to obtain the EC density of a smooth random field 
T = T{s) as 



where dot notation with subscript d denotes differentiation with respect to the 
first d components of s (Worsley, 1995a). For d = 0, po(t) = P{T > t). The 
Morse method of obtaining EC densities, though straightforward in principle, 
usually involves an enormous amount of tedious algebra. Entire papers have 
been devoted to evaluating (12) for an ever wider class of random fields of test 
statistics such as Gaussian (Adler, 1981), x^, T, F (Worsley, 1994), Hotelling's 
(Cao and Worsley, 1999b), correlation Cao and Worsley (1999a), scale space 
(Siegmund and Worsley, 1995; Worsley, 2001; Shafie et al., 2003) and Wilks's 
A (CarboncU and Worsley, 2007). A much simpler method is given in the next 
section. 

3.1. The Gaussian Kinematic Formula 

There is a much simpler way of getting EC densities when T is built from inde- 
pendent unit Gaussian random fields (UGRF). A UGRF is a Gaussian random 
field with zero mean, unit variance, and identity variance of its spatial deriva- 
tive. Note that any stationary Gaussian random field can be transformed to a 
UGRF by appropriate linear transformations of its domain and range. Without 
loss of generality we shall assume that all the random fields considered so far 
are built from UGRFs. 

This simpler method is based on the Gaussian Kinematic Formula discov- 
ered by Taylor (2006). The idea is to take the Steiner-Weyl volume of tubes 
formula (24) and replace the search region by the rejection region, and volume 
by probability. Somewhat miraculously, the coefficients of powers of the tube 
radius are (to within a constant) the EC densities we seek. 

The details are as follows. Suppose T(s) = f{Z{s)) is a function of UGRFs 
Z{s) = (Zi(s), . . . , Z„(s))'. Put a tube of radius r about the rejection region 
Rt = {z G K" : f{z) > t} C K", evaluate the probability content of the tube 
(using the N„(0, /„xn) distribution oi Z = Z[s)), and expand as a formal power 
series in r. Denoting the tube by Tube(i?t, r) = {x : min^g/j^ ||z — a;|| < ?'}, then 




(12) 



OO 



4 




(13) 
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Since the spatial dependence on s is no longer needed, we omit it until further 
notice. 

For example, let f{z) — u'z for fixed u with ||u|| = 1 so that T is a UGRF. 
Without loss of generality we can assume that n = 1 and hence /(z) — z. It is 
easy to see that Rt = [t, +oo) and further 

Tube(i?t, r) = [t - r, +oo) = Rt-r- 

This observation leads directly to the EC density of the Gaussian random field 

We shall exploit this observation, that the tube is another rejection region but 
with a lower threshold, to derive the EC density for the x random field in the 
next section. 

3.2. The X random field 

Now let Rt C M" be the rejection region for the x random field at level t. This 
rejection region is the union of half planes all a distance t from the origin. It is 
clear that a tube of radius r about such a rejection region is simply another union 
of half planes all a distance t — r from the origin (provided r <t). We thus arrive 
at precisely the same expression as for the Gaussian case: Tube(i?t,r) = Rt-r- 
In exactly the same way, this leads directly to the following representation for 
the EC densities of a x random field: 

We can now use the mixture representation (8) to show that the EC density of 
X is the same mixture of EC densities of the Xj random field. To see this, note 
that, by setting U — O^^^ in (15), the EC density of Xj is 

p^(i;j) = (^|)'iP(x,><). (16) 

Combining this with (15) and (8) leads to the first expression of the following 
Theorem. 

Theorem 1. // Cone{U) is convex then the EC density of the x random field 
is 

n n—1 

where p^{t;j) and p^{t) are the EC densities of the the Xj random field (16) 
and Gaussian random field (14), respectively - 
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The second part of the Theorem is proved as foUows. Another way of evalu- 
ating P(x > t) is to note that u' Z, as a function of u, is a UGRF and that x 
is its maximum over U. Hence we can use the approximation (11) for Gaussian 
random fields, replacing S by U. This is exact for t > when Cone(C/) is con- 
vex. The reason is that the excursion set {u G U : u' Z > t} generates a cone 
that is the intersection of a convex circular cone (provided t > 0) with convex 
Cone{U), which is again convex. The EC oi {u G U : u'Z > t} is either or 1 if 
it is empty or not, that is, if x is less than or greater than t. Hence the expected 
EC is the P-value, so that (11) is exact and gives 

p(x>t)-E^^(t^vfw- (17) 

Combining this with (15) yields the second expression of Theorem 1. Note that 
the weights Pj{U) can now be expressed in terms of intrinsic volumes by equating 
(17) to (8) to give 

an 1 L^"^^ ^-i)-(d + 2m)! , 

(see Chapter 15 in Adler and Taylor (2007)). 



Remark 1: If Cone(C/) is not convex, the above argument used to derive (17) 
fails, though (15) still holds for the coefhcients in the exact tube expansion, in 
the sense that Tube(i?t,r) = Rt-r- However, if Cone(C/) is locally convex (17) 
is exponentially accurate Taylor et al. (2005) and therefore the right hand side 
of the result in Theorem 1 is the EC density up to an exponentially small error. 

Remark 2: The representation (7) represents x(s) (reinstating dependence on 
s) as a mixture of Xj{s) random fields with weights pj{U). It is therefore not 
surprising that the EC density of the x(s) random field is a mixture of the EC 
densities of Xj{s) random fields with the same weights. We give a sketch of a 
proof why this should be so for the simplest cone: the positive orthant in R'^ 

k 

J=l 

For this cone, a face is determined by a subset of {1, . . . , fc} which are the set 
of non-negative components of /i(s). It is not hard to see that A{s) = {j : 
Zj{s) < 0}'^ with the empty set representing the vertex of the cone. We shall 
now make use of Morse theory, which shows that the EC of a set is determined 
by the critical points of a twice differentiable Morse function defined on the set 
(Adler, 1981). The Morse theory expression for the EC density (12) is obtained 
by using the random field itself as the Morse function (Worsley, 1995a). The 
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random field x{s) as a Morse function is actually differentiable (though not twice 
differentiable) and it is not hard to show that its critical points are almost surely 
contained in the interior of the patches. This is because the critical points on the 
boundary are points where a particular Xji^) random field has a critical point 
and one or more components are (see Figure 2). For instance, critical points 
that appear on the segment of boundary of the intersection of {s : Zi{s) = 0} 
and the patch {s : A{s) = 0} are points where Zi{s) has a critical point and 
Zi{s) — 0. The number of such points is almost surely 0. Because there are no 
critical points on the boundary of the patches, we can redefine xi^) near these 
boundaries to get a Morse function with the same critical points as x(s) and 
the standard Morse-theoretic computation of the expected EC now shows that 
for each patch J C {1, . . . , /c} we must find the number of critical points of 
Xj(s)^ = ^j^j Zj{s) above the level t, counting multiplicities. The expected 
EC above the level t, similar to (12) will therefore be 

E ^ {^{Ais)=j}hxAs)>t}deti-XjAs)) I XjA^) = O) P(xj,rf(s) = 0). 

JC{1,...,N} 

Noting that the conditional distribution of xj,d(s) given {Z(s), Z{s)) depends on 
Z{s) only through ||Zj(s)|| implies that x./.ti(s) and are conditionally 

independent given {Z{s), Z{s)). In fact, this also implies that they are actually 
unconditionally independent. This completes the sketch of the proof: the sum 
over all subsets J of size j yields Pj{U) times the EC densities of x'j random 
fields from (12). To go from the x(s) to the Tin(s) or Tlr(s) random field is 
not complicated: simply replace xj above by the appropriate F random fields in 
the decomposition (9) or (10), though the conditional independence argument 
is just slightly more complicated. In the following sections, we prefer to use the 
Gaussian kinematic formula to give a more direct and complete proof which 
does not refer to Morse theory and counting critical points. 



3.3. The F- and T-statistic random fields 

Our main results, stated in Theorem 2 and Theorem 3, are based on a simple 
refinement of Theorem 1 in which we incorporate a x^ field in the denominator. 
To see how it works, let us use the Gaussian kinematic formula to derive the 
EC density of the F-statistic field. Let Rt C K" be the rejection region of the F- 
statistic random field F with k, v degrees of freedom. Without loss of generality, 
setting z = (zi, . . . , z„), we can take 



Then, a little elementary geometry (see Figure 4) shows that 

P (Z e Tube(i?t, r)) = V{xk > T,) + 0(r") (18) 



imsart-generlc ver. 2011/11/15 file: cone-ims.tex date: July 18, 2012 



Taylor & Worsley/Detecting sparse cone alternatives 



15 



where 

[tk I tk 

The remainder above reflects the fact that the tube Tube(i?t,r) is almost equal 
to the event {xk > 7r}. Near the origin, this fails but the probability content 
of where this fails is of order 0(r"). Further, the EC densities of F are only 
defined for d < D < n (as explained in Section 2.4). Continuing with the main 
term in (18), and making use of (14), 



= E|^/:,(o^-i)pf(r.) 

d/2 fe-1 



(19) 

Hence, the EC densities for an F-statistic random field with k, v degrees of 
freedom are given by 




(20) 




For the T-statistic random field Ti, a similar argument to that leading to 
(18) shows that we must expand the following probability in a power series: 



Zi>X 



where Z\ ~ A''(0, 1) is independent of Xv In the above expression, t^ appears 
instead of t because is an F\^i, random field and Z\ appears rather than 
Xi = \Z]\ on the left hand of the inequality side because Ti is one-sided. Similar 
calculations to those above for the F-statistic yield the following expression for 
the EC densities of the T-statstic random field 



pl{t-v)=[\^^^ P.{(PAx.^^^ 



_ (-l)'(d-l)!r( '^-l-/'+'- ) /t2 \ (rf-l-2/)/2 X -(--1 

for d > and P(Ti > i) for d = 0. This is simpler than the expression in 
Worsley (1994); it is a single sum, whereas the the expression in Worslcy (1994) 
is a double sum. 
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Fig 4. Rejection region Rt of the F statistic F = {z\ + z^)/^/ with k = 2 and v = I. The 
purple axes are from -1 to 1. The cone generator U is blue, ConeiU) is transparent yellow. 
The rejection region for a threshold of t = 3/2 is red; the tube about the rejection region 
(radius r = 0.15J is transparent green. Both rejection region and tube are cut at Z2 > and 
I < 1/ \/3- We expand the probability of this tube as a power series in r; its coefficients are 
the EC densities we seek. 



A simple rearrangement of (20) yields the following equivalent representation 
of the EC densities of the F-statistic random field in terms of the EC densities 
of the T-statstic random field: 

^ ^ 3=0 

3.4- The independently normalized cone random field Tin 

It is slightly easier to work with Tin, since it more closely resembles F, so 
we tackle this ahead of Tlr. It should now be clear how to proceed: find the 
rejection region as a function of the n UGRF's; put a tube around with radius 
r; work out the probability content; differentiate d times to get the EC density. 
This sounds formidable, but it is in fact virtually identical to the case of the 
F-statistic presented above. For readers with good geometric intuition. Figure 5 
might help: it shows the simple case of the rejection region Rt = {Z : Tin > t} 
where k = 2 and i> = 1, and [/ is a quarter circle, as in Figure 2. 

Theorem 2. // Cone{U) is convex then the EC density of the independently 
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Fig 5. Rejection region Rt of the independently normalized test statistic Tjn for the same 
cone as in Figure 2 and the same z as in Figure 4- The cone edges and X2 are black. The 
threshold is t = \/3 and both the rejection region and tube are cut at zi ± Z2 > — \/2 and 
\Z3\ < 1/V3. 



normalized cone random field Tin is 



2 \ -i/2 



The EC densities are valid for d < i' + max{l{U), 1), where 1{U) is the dimension 
of the largest linear subspace in Cone{U). 

Remark: The representation (10) represents Tin as a patchwork mixture of 
j ■ Fj^v random fields with weights Pj{U). See Remark 2 after Theorem 1 for 
why Theorem 2 should not be surprising. For the case of non-convex Cone(f7), 
see Remark 1 after Theorem 1. 



imsart-generlc ver. 2011/11/15 file: cone-ims.tex date: July 18, 2012 



Taylor & Worsley/Detecting sparse cone alternatives 



18 



Proof: The same geometric argument that led to (18) leads to the following 
approximate equality 



In fact, {Z e Tube(i?t,r)} is contained within {x > T*} with the difference 
coming from points where T* and x are both near 0. If 1{U) > 1, the probability 
of this difference, as a function of the tube radius r, is of order 0(r'^'^^+''). If 
1{U) = 0, then similar arguments to those in Section 2.4 show that we need only 
worry about 0/0 when x > but is close to 0, that is, when its xi components 
are near and Xu is also near 0. The probability of this is of order 0{r'^'^^). 
Since we must have d < v+ma,x(l(U), 1) anyway to avoid 0/0, we can ignore this 
difference in either case, thus for our purposes we need only expand P(x > T*) 
as a power series in r. This computation is essentially identical to the case of the 
F-statistic where O*^"^ is replaced with a general U. Following the calculations 
preceding (20): 



To derive the EC densities in terms of F EC densities, simply use (8), (19) and 



{ZeTube(i?t,r)}~{x>T;} 



where 






(20): 



fc 



p(x>r;) 



E P,(c/)P(x, >t;) 



j=max(;('7),l) 




□ 



3. 5. The likelihood ratio cone random field Tlr 



Figure 6 illustrates the rejection region Rt of Tlr. 
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Fig 6. As for Figure 5, but for the likelihood ratio test statistic Tlr at a threshold t = 3, cut 
at \\z\ \ < 1; 4> = aiccos(t / \/n + t^) = n/6. 



Theorem 3. // Cone{U) is convex then the EC density of the likelihood ratio 
cone random field Tlr is 



t n-j 



The EC densities are valid for d < n. 

Remark: As for Tin, the representation (9) represents Tlr as a patchwork 
mixture of \J jnj (n — j) ■ Fj,n~j random fields with weights Pj{U). See Remark 
2 after Theorem 1 for why Theorem 3 should not be surprising. For the case of 
non-convex Cone(J7), see Remark 1 after Theorem 1. 

Proof: It is easier to transform to the equivalent correlation coefficient 

^ _ Jlr _ X _ u'Z 
v/^^TT^ \\Z\\ ueu \\Z\\ 

Then the rejection region C > c is simply a cone centered at the origin that inter- 



sects the unit sphere in a tube of geodesic radius <j) = arccos c = arccos(i/v^n+~i2) 
about U: 

Kt — { z : arccos max T--7 | < 
\u(^U \\z\ 
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When Cone(f7) is convex there is an exact expression for the probabihty content 
of a tube about a subset of the sphere, similar to (8) (Lin and Lindsay, 1997; 
Takemura and Kuriki, 1997): 



> c 



\Z 



n 

¥{Z e Rt) = ^Pj(C/)P (arccos(A/B~) < 0^ 



where Bj is a Beta random variable with parameters j/2, (n—j)/2 (with i?„ = 1 
with probability one). The restriction of Cone(J7) to a convex set is not neces- 
sary, as it was for x - the only requirement is that t must be sufficiently large 
(i.e. (p must be sufficiently small) so that the tube does not self- intersect. This 
phenomenon is similar to what occurs when establishing the accuracy of (11) 
for non-convex regions Cone(C/). If Cone(t7) is convex then t > suffices. 

The next step is to put a tube about the rejection region Rt- Provided r is 
sufficiently small, a (Euclidean) tube of radius r about Rt intersects the sphere 
of radius ||z|| in a spherical tube of geodesic radius 9 — arcsin(r/||2;||) about 
Rt. For fixed ||2;|| sufficiently large, Rt is already a spherical tube about H^Hf/, 
so the (Euclidean) tube about Rt is a spherical tube about \\z\\U of geodesic 
radius + 6: 

Tube(i?t, r) = I z : arccos | max tt-tt ] < 6 + 
I \^<^u\\z\\J- 

The part of the tube near the origin with small ||z|| may contain a "wedge" of 
the ball of radius r (see Figure 5(a)) that is the only part of the whole tube 
that contributes to the coefficient of r". As pointed out in Section 2.4, Tlr is 
only defined for d < D < n so we can ignore this. It therefore follows that it is 
sufficient for us to work with 

n 

F{Z e Tube(i?t,r)) = ^pj([/)P (a.rccos{y^j) < + +0(r"), (21) 
where 8 = arcsin(r/||Z||) is independent of Bj. The inequality in (21) is 



arccos( V^) - < 8 v^l - Bjc - a/bJVi - < 



.\z\y 

so that 



' (arccos(/B;) < + 8^ = P [ > Xn-j\j^^ " '^V ^ + ^ 



where Xj Xn-j are the square roots of independent x^ random variables with 
degrees of freedom indicated by their subscripts. Putting everything together, 
the EC density that we seek is the coefficient of r'^{2T:Y^'^ / d\ in 

P(Z e Tube(i?t,r)) = f]p,(C/)P ( > Xn- 

3 = 1 \ 
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Fig 7. The hemodynamic response function ho (left, dashed line) and the two extremes hg ± 
2ho (left, solid lines) convolved with the on- off painful heat stimulus g (right, dotted line) to 
give the "middle" of the cone u (right, dashed line) and the two cone edges, the regressors 
^1,2 = {ho it 2ho) -kg (right, solid lines). The on-off stimulus is repeated ten times, from to 
360 seconds. 



Since this expression is linear in the tube probabihties, we can differentiate im- 
mediately to arrive at the result we are looking for. □ 



4. Application 

Friman et al. (2003) and Calhoun ct al. (2004) proposed the cone and one-sided 
F-statistics for the detection of functional magnetic resonance (fMRI) activation 
in the presence of unknown delay in the hemodynamic response. We illustrate 
our methods with a re-analysis of the fMRI data from study an pain percep- 
tion that was used by Worsley and Taylor (2006). The data, fully described 
in Worsley ct al. (2002), consists of a time series of 3D fMRI images Z{s,t) 
at point s G in the brain at time r. The subject received an alternating 9 
second painful then neutral heat stimulus to the right calf, interspersed with 9 
seconds of rest, repeated 10 times. The mean of the fMRI data is modeled as 
the indicator for each stimulus ((/(t) = 1 if on, if not) convolved with a known 
hemodynamic response function (hrf) ho{T) that delays and disperses the stim- 
ulus by about 5.5 seconds (see Figure 7). Taking ^(t) as just the painful heat 
stimulus, we add this to a linear model for the fMRI data: 

Z{s,t) = {ho*g)iT)Pis) + a{s)e{s,T), 

where e(s, r) ~ N(0, 1). Our main interest is to detect regions of the brain that 
are 'activated' by the hot stimulus, that is, points s where /3(s) > 0. 

There is often some doubt about the 5.5 second delay of the hrf, so to allow for 
unknown delay, we shift /io(t) by an amount S{s) and add S{s) as a parameter 
to the hrf. To keep the linear model, we then approximate the shifted hrf by a 
Taylor series expansion in S{s) (Friston ct al., 1998): 



/i(r; 6is)) = ho{T - <5(s)) « /io(t) - (5(s)/^o(r). 
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The convolution of h{T;6{s)) with the stimulus g^r) is then roughly equivalent 
to adding the convolution of —ho^r) with the stimulus as an extra regressor to 
give the linear model: 

Z{s, t) = iho*g){T)Pis) - {ho * g)iT)Pis)S{s) + a(s)e(s, r). 

However the key ingredient in the model is that there is some structure to 
the coefficients dictated by the physical nature of the regressors. It is strongly 
suspected that /?(s) > and the shift is restricted to a range of known plausible 
values S{s) € [Ai,A2]. In our example, we take [Ai,A2] = [—2,2] seconds. It 
is easy to see that the restrictions specify a non-negative-coefficient regression 
model 

Z(s,t) = xi{t)Pi{s) + X2{t)/32{s) + a(s)e,(s,T), (3i{s) > 0,^2{s) > 0, 

with regressors xj — {h — Ajh) * g, j = 1, 2, illustrated in Figure 7. The model 
is sampled at n equal intervals over time and suppose for simplicity that the 
resulting observations are independent. Replacing dependence on r by vectors 
in M", the linear model is the same as (3) with m = 2: 

Zis)^xMs)+X2P2is) + cT{s)e{s), /?i(s)>0,/32(s)>0, (22) 

where e(s) is a vector of n iid stationary Gaussian random fields. This model 
(22) is of course a 2D {k = 2) cone alternative with cone angle 

a = arccos(x'ia;2/(||a:i|| -11x211)). (23) 

The cone intrinsic volumes are Co.iiU) — l,a, and the x weights are pi^2{U) = 
1/2, Q;/(27r). The "middle" of the cone is u = {xi + X2)/2, appropriately nor- 
malized, which of course corresponds to the unshifted model with S = 0. 

In practice our observations were temporally correlated and we added regres- 
sors to allow for the neutral heat stimulus and a cubic polynomial in the scan 
time to allow for drift, leaving n — 112 effectively independent observations sam- 
pled every 3 seconds. The resulting a, found by whitening the regressors and 
removing the effect of the added nuisance regressors before calculating (23), 
now depends on s since the temporal correlation depends on s. However a was 
remarkably constant across the brain, averaging at a = 1.06 ± 0.03 radians or 
60.9 ± 1.7°, so we take it as fixed at its mean value. 

The search region S is the entire brain. The error random fields ei{s) are not 
isotropic, so we must use Lipschitz-Killing curvatures of S instead of intrinsic 
volumes. The highest order term with d = D makes the largest contribution to 
the P- value approximation (11), and fortunately there is a very simple unbiased 
estimator for CoiS) (Worsley et al., 1999; Taylor and Worsley, 2007). At a 
particular voxel, let E be the n x 1 vector of least-squares residuals from (22), 
and let N = E/\\E\\. Let Q be the nxD matrix of their spatial nearest neighbor 
differences, that is, column d of Q is N{s2) — N{si) where si, S2 are neighbors 
on lattice axis d. Then the estimator of Cd{S) is 

-^15(5) =^det(Q'g)i/2, 
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Test statistic 



P = 0.05 threshold Detected volume (cc) 



(a) T-statistic, Ti 

(b) Cone statistic, Tlr ~ Tin 

(c) One-sided F-statistic, y'2_F+ 

(d) F-statistic, 



5.15 4.0 

5.44 4.3 

5.63 3.8 

5.80 2.9 



Table 1 



Test statistics, P = 0.05 thresholds, and volume of detected activation for the application in 
Figure 8, in order of increasing threshold. The cone statistic detects the most activation. 

where summation is taken over all voxels inside S (Worsley ct al., 1999; Taylor 
and Worsley, 2007). The result is €.3(3) = 8086, which is of course unitless. The 
lower order Lipschitz-Killing curvatures are much more difhcult to estimate, 
but they can be very accurately approximated by those of a ball with the same 
volume, that is with radius r = 12.5, to give £0.1,2(5') = 1, 47rr, 27rr^. 

We are now ready to use (11) to get approximate P-values for the maximum 
of our test statistic random fields. Since the degrees of freedom = 110 is so 
large, the two cone statistics were almost identical, so we only show results for 
the independently normalized cone statistic. The P — 0.05 thresholds are shown 
in Table 1. Note that the values of the statistics are increasing since the cone 
is getting larger, but of course the P = 0.05 thresholds are increasing as well 
to compensate for this. The net result is that the volume of detected activation 
due to the painful heat stimulus remains roughly the same. Interestingly, it is 
the cone statistic with delays in the range [—2,2] seconds that detects the most 
activation. This activation is shown in Figure 8 (left primary somatosensory 
area and left and right thalamus). 

The last question is which test is the most powerful. Worsley and Taylor 
(2006) gives a power comparison of the four tests that shows that if the true 
delay is in the range [—1, 1] seconds then the usual T-statistic Ti is the most 
powerful, but outside this range, the cone statistic is the most powerful. 

Appendix A: Intrinsic volume 

The d-dimensional intrinsic volume of a set 5 is a generalization of its volume to 
lower dimensional measures. The Z?-dimensional intrinsic volume of 5* C is 
its usual volume or Lebesgue measure, the {D — l)-dimensional intrinsic volume 
of S is half its surface area, and the 0-dimensional intrinsic volume is the Euler 
characteristic of S. The simplest definition is implicit, identifying the intrinsic 
volumes as coefficients in a certain polynomial. This definition comes from the 
Steiner-Weyl volume of tubes formula which states that if S has no concave 
'corners', then for r small enough 



where | • | denotes Lebesgue measure and ujd = tt^^^ /r{d/2 + 1) is the Lebesgue 
measure of the unit ball in M'*. 



D 




(24) 
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Fig 8. Detecting activation in fMRI data. Each image shows the search region (the brain, 
left front facing viewer) and a slice of the test statistic (color coded) thresholded at P = 0.05 
(red-pink blobs - see Table 1). The test statistics, in order of increasing threshold, are (a) the 
T-statistic Ti; (b) the cone statistic Tjn (indistinguishable from Tlj^ in this case); (c) the 
square root of twice the one-sided F-statistic y'2_F+/ (d) the square root of twice the F-statistic 



If S is bounded by a smooth hypersurface, so that there is a unique normal 
vector at each point on the boundary, then a more direct definition is as fohows. 
Let C(s) be the [D — \) y. [D — 1) inside curvature matrix at s e dS, the 
boundary of S. To compute the intrinsic volumes, we need the det-traces of a 
square matrix: for a, d x d symmetric matrix A, let detTj{A) denote the sum of 
the determinants of all j x j principal minors of A, so that detrii{A) = det(^), 
detri(A) = tr(^), and we define detro(A) = 1. Let ad = 27r''/2/r(d/2) be the 
{d — l)-dimensional Hausdorff (surface) measure of the unit {d — l)-sphere in 
M''. For d = 0, . . . , D — 1 the d-dimensional intrinsic volume of S is 

Cd{S) = — f detY D-i-d{C{s)}ds, 
an-d JdS 

and Cd{S) = \S\, the Lebesgue measure of S. Note that Cq{S) — (p{S) by the 
Gauss-Bonnet Theorem, and £d-i{S) is half the surface area of S. 

For the unit (/c — l)-sphere, C = ±I(^k-i)xik-i) on the outside/inside of O'''^^, 
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SO that 



\ d J ak-d dl] 
•Yen, and zero otherwise, d = 0, . . . , i 




(25) 
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